Search for: All records
Creators/Authors contains: "Erez, Mattan"


  1. The microservice architecture is increasingly popular for flexible, large-scale online applications. However, existing resource management mechanisms incur high latency in detecting Quality of Service (QoS) violations and hence fail to allocate resources effectively under commonly observed varying load conditions. The result is over-allocation coupled with a late response, which increases both the total cost of ownership and the magnitude of each QoS violation. We present SurgeGuard, a decentralized resource controller for microservice applications specifically designed to guard application QoS during surges in load and network latency. SurgeGuard uses the key insight that, for rapid detection and effective management of QoS violations, the controller must be aware of any available latency slack and of the communication patterns between microservices within a task graph. Our experiments show that, for the workloads in DeathStarBench, SurgeGuard on average reduces the combined violation magnitude and duration by 61.1% and 93.7%, respectively, compared to the well-known Parties and Caladan algorithms, and requires 8% fewer resources than Parties.
    Free, publicly-accessible full text available November 17, 2025
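    The abstract's core mechanism is tracking latency slack across the microservice task graph. Below is a minimal sketch of that idea, assuming a hypothetical graph, latencies, and guard-band threshold; it is not the actual SurgeGuard controller.

```python
# A minimal sketch of the slack-based detection idea from the SurgeGuard
# abstract. The graph, latencies, threshold, and function names are all
# illustrative assumptions, not the actual SurgeGuard implementation.

# Microservice task graph: edges point to downstream (callee) services.
GRAPH = {
    "frontend":  ["search", "recommend"],
    "search":    ["storage"],
    "recommend": ["storage"],
    "storage":   [],
}
MEASURED_MS = {"frontend": 4.0, "search": 20.0, "recommend": 9.0, "storage": 6.0}
QOS_BUDGET_MS = 30.0  # end-to-end latency target for a request


def critical_path_ms(node):
    """Longest measured latency from `node` down to any leaf."""
    children = [critical_path_ms(c) for c in GRAPH[node]]
    return MEASURED_MS[node] + (max(children) if children else 0.0)


def check_slack(root="frontend", guard_band=0.1):
    """Flag the critical path for scale-up when remaining slack is too small.

    A decentralized controller would make this decision locally per service;
    the sketch computes it globally for brevity.
    """
    slack = QOS_BUDGET_MS - critical_path_ms(root)
    flagged = []
    if slack < guard_band * QOS_BUDGET_MS:
        node = root
        while True:
            flagged.append(node)
            if not GRAPH[node]:
                break
            # Follow the slowest child: the critical path gets resources first.
            node = max(GRAPH[node], key=critical_path_ms)
    return slack, flagged


print(check_slack())  # -> (0.0, ['frontend', 'search', 'storage'])
```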
  2. High load latency, resulting from deep cache hierarchies and relatively slow main memory, is an important limiter of single-thread performance. Data prefetching helps hide this latency by fetching data up the hierarchy before load instructions request it. However, prefetching has been shown to be imperfect in many situations. We propose cache-level prediction to complement prefetchers. Our method predicts which memory hierarchy level a load will access, allowing the access to start at that level earlier and thereby saving many cycles. The predictor provides high prediction accuracy at the cost of just one cycle of added latency on L1 misses. Level prediction reduces memory access latency by 20% on average and provides a speedup of 10.3% over a conventional baseline and 6.1% over a boosted baseline on generic, graph, and HPC applications.
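    As an illustration of the cache-level prediction idea, the sketch below models a small PC-indexed predictor table. The table size, indexing, and last-outcome update policy are assumptions for the sketch, not the paper's exact design.

```python
# Illustrative model of a PC-indexed cache-level predictor in the spirit of
# the abstract above; structure and policy are assumptions, not the paper's.

LEVELS = ("L1", "L2", "L3", "DRAM")

class LevelPredictor:
    def __init__(self, entries=1024):
        self.entries = entries
        self.table = ["L1"] * entries  # last observed hit level per load PC

    def _index(self, pc):
        return (pc >> 2) % self.entries  # drop alignment bits, then index

    def predict(self, pc):
        # Consulted at issue time: the load can be launched directly at the
        # predicted level instead of walking the hierarchy level by level.
        return self.table[self._index(pc)]

    def update(self, pc, actual_level):
        # Trained when the load resolves; a mispredict simply falls back to
        # the normal hierarchy walk.
        assert actual_level in LEVELS
        self.table[self._index(pc)] = actual_level

p = LevelPredictor()
p.update(0x4008, "L3")    # this load PC has been hitting in L3
print(p.predict(0x4008))  # -> "L3": start the access at L3 early
```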
  3. Python's ease of use and rich collection of numeric libraries make it an excellent choice for rapidly developing scientific applications. However, composing these libraries to take advantage of complex heterogeneous nodes is still difficult. To simplify writing multi-device code, we created Parla, a heterogeneous task-based programming framework that fully supports Python's scientific programming stack. Parla's API is based on Python decorators and allows users to wrap code in Parla tasks for parallel execution. Parla arrays enable automatic movement of data between devices. The Parla runtime handles resource-aware mapping, scheduling, and execution of tasks. Compared to other Python tasking systems, Parla is unique in its parallelization of tasks within a single process, its GPU context- and resource-aware runtime, and its design around gradual adoption, which eases migration of, and integration into, existing Python applications. We show that Parla can achieve performance competitive with hand-optimized code while improving ease of development.
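    The decorator-based API described above can be sketched as follows. The module and parameter names follow Parla's published examples but may differ across versions; treat this as a schematic rather than exact API usage.

```python
# Schematic example of the decorator-based tasking style the abstract
# describes. Names are taken from Parla's public examples and may vary
# across versions; this is a sketch, not authoritative API documentation.
from parla import Parla, spawn, TaskSpace

def main():
    T = TaskSpace("T")

    @spawn(T[0])
    def produce():
        print("task 0: build input data")

    # The dependency list makes T[1] wait for T[0]; both tasks run in
    # parallel within a single process, per the abstract.
    @spawn(T[1], dependencies=[T[0]])
    def consume():
        print("task 1: consume the data")

if __name__ == "__main__":
    with Parla():  # start the Parla runtime, then spawn tasks inside it
        main()
```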
  4. Today, larger memory capacity and higher memory bandwidth are required for better performance and energy efficiency in many important client and datacenter applications. Hardware memory compression is a promising way to achieve this without increasing system cost. Unfortunately, current memory compression solutions face two significant challenges. First, keeping memory compressed requires additional memory accesses, sometimes on the critical path, which can cause performance overheads. Second, they require changes to the operating system to take advantage of the increased capacity and to handle incompressible data, which delays deployment. We propose Compresso, a hardware memory compression architecture that minimizes the memory overheads of compression and requires no OS changes. We identify new data-movement trade-offs and propose optimizations that reduce additional memory movement to improve system efficiency. We also propose a holistic methodology for evaluating compressed systems. Our results show that Compresso achieves an average main-memory compression ratio of 1.85x, with a 24% speedup over a competitive hardware-compressed system for single-core systems and 27% for multi-core systems. Compared with competitive compressed systems, Compresso not only reduces the performance overhead of compression but also increases the performance gain from higher memory capacity.
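    To make the compression side concrete, here is a toy base-plus-delta line compressor of the kind hardware memory compression schemes build on. It is a generic textbook technique, not necessarily Compresso's actual compression algorithm.

```python
# Toy base+delta compressor for one 64-byte line (eight 8-byte words). This
# illustrates the kind of scheme hardware memory compression builds on; it
# is a generic technique, not necessarily Compresso's algorithm.

def compress_line(words, delta_bits=16):
    """Return (base, deltas) if all words fit as narrow deltas, else None."""
    base = words[0]
    limit = 1 << (delta_bits - 1)
    deltas = [w - base for w in words]
    if all(-limit <= d < limit for d in deltas):
        # 8-byte base + eight 2-byte deltas = 24 bytes instead of 64.
        return base, deltas
    return None  # incompressible: the line is stored uncompressed

def decompress_line(base, deltas):
    return [base + d for d in deltas]

line = [0x1000, 0x1008, 0x1010, 0x1018, 0x1020, 0x1028, 0x1030, 0x1038]
packed = compress_line(line)
assert packed is not None and decompress_line(*packed) == line
```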